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EXECUTIVE  SUMMARY 


Hyperspectral  imaging  (HSI)  sensors  provide  imagery  with  hundreds  of  spectral  bands,  typ¬ 
ically  covering  VNIR  and/or  SWIR  wavelengths.  This  high  spectral  resolution  offers  promise  for 
many  applications,  but  it  also  produces  enormous  volumes  of  data,  which  may  be  problematic  for 
storage  and  transmission.  Lossy  compression  may  therefore  be  necessary,  but  application  perfor¬ 
mance  degradation  that  results  from  compression  is  of  concern.  This  report  documents  results  for 
a  spectral-spatial  lossy  compression  scheme  and  a  variety  of  applications:  normalized  difference 
vegetation  index  (NDVI),  integrated  column  water  vapor  (CWV),  and  background  classification. 

The  compression  scheme  first  performs  principal-components  analysis  spectrally,  then  dis¬ 
cards  many  of  the  lower-importance  principal-component  (PC)  images,  and  then  applies  JPEG2000 
spatial  compression  to  each  of  the  individual  retained  PC  images.  Two  different  rate-allocation 
methods,  which  select  the  spatial  compression  ratios,  are  considered. 

The  assessment  of  compression  effects  considers  general-purpose  distortion  measures,  such 
as  root-mean-square  difference.  It  also  examines  changes  in  NDVI  and  CWV  data  products  and 
proposes  statistical  tests  for  deciding  whether  compression  causes  significant  degradations  in  clas¬ 
sification  results. 
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1.  INTRODUCTION 


A  hyperspectral  imaging  (HSI)  sensor  gathers  radiance  imagery  over  a  wide  range  of  the  electromag¬ 
netic  spectrum  with  high  spectral  resolution.  The  sensor  produces  an  x  Ny  image  in  which  each  pixel 
has  Nc  «  100-200  spectral  bands.  Such  an  image  is  commonly  referred  to  as  an  1%  x  Ny  x  Nc  cube. 
For  example,  the  HYDICE  (Hyperspectral  Digital  Imagery  Collection  Experiment  [22])  airborne  HSI  sen¬ 
sor  produces  320  x  320  x  210  major  frames  of  16-bit  values,  with  spectral  coverage  from  400-2500  nm 
and  spectral  resolution  of  5-16  nm  per  band.  Several  major  frames  can  be  concatenated  to  form  a  larger 
cube;  i.e.,  four  major  frames  yield  a  sample  HYDICE  cube  of  dimensions  320  x  1280  x  210.  HSI  thus  pro¬ 
vides  both  spatial  coverage  and  rich  spectral  information,  which  facilitates  applications  such  as  background 
classification,  material  identification,  and  target  detection. 

On  the  other  hand,  the  sheer  volume  of  data  produced  by  an  HSI  sensor  creates  problems  of  storage 
and  transmission.  The  example  320  x  1280  x  210  HYDICE  cube  requires  153.8  MB  of  storage.  Lossless 
compression  can  usually  reduce  this  requirement  by  a  factor  of  2-3,  which  still  puts  the  storage  require¬ 
ment  in  the  tens  of  megabytes  and  may  be  insufficient  for  timely  transmission,  dissemination,  or  archiving. 
Lossy  compression  then  becomes  necessary.  Because  lossy  compression  discards  potentially  useful  infor¬ 
mation,  researchers  are  interested  in  evaluating  its  effects  on  application  performance  [20,  24,  26].  This 
report  employs  a  simple  but  effective  lossy  compression  scheme  and  studies  its  effects  on  a  number  of  ap¬ 
plications,  namely  normalized  difference  vegetation  index,  integrated  column  water  vapor,  and  background 
classification. 


1.1  SELECTION  OF  COMPRESSION  STRATEGY 

This  report  focuses  on  what  can  best  be  termed  “general-purpose  lossy  compression.”  To  place  the 
report  in  context,  this  section  discusses  the  matter  of  selecting  a  compression  strategy.  The  selection  of  a 
compression  strategy  involves  factors  such  as  whether  or  not  lossy  compression  can  be  tolerated,  the  re¬ 
quired  amount  of  lossless  or  lossy  compression,  and  the  intended  application  area.  Other  factors  include 
implementation  complexity,  onboard  or  ground-based  processing,  latency,  error  resilience,  cost,  and  stan¬ 
dardization. 

Figure  1  presents  a  decision  flowchart  that  considers  the  most  general  aspects  of  compression:  namely, 
lossless  or  lossy  compression,  the  end  application  for  the  decompressed  data,  and  the  distortion  measure 
should  lossy  compression  be  required.  The  other  factors  mentioned  above  certainly  also  play  a  significant 
role,  but  the  flowchart  addresses  the  basic  question  of  whether  lossy  compression  is  even  necessary,  and  if 
so,  whether  existing  or  novel  compression  techniques  are  needed. 

The  top  of  the  flowchart  posits  the  most  basic  question:  Is  lossless  compression  required,  and  is 
it  achievable  within  the  problem  constraints?  If  so,  lossy  compression  becomes  unnecessary.  However, 
problem  constraints  can  make  lossless  compression  impractical  or  impossible.  A  space-borne  HSI  sensor 
with  limited  downlink  capability  is  a  prime  example.  Even  though  the  sensor  can  gather  large  amounts 
of  data,  there  is  insufficient  bandwidth  to  transmit  all  of  the  data  to  the  ground.  In  such  situations,  lossy 
compression  becomes  necessary. 

Within  the  realm  of  lossy  compression,  other  decisions  follow.  The  second  question  asks  whether 
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Figure  1.  Flowchart  for  selection  of  compression  strategy. 

the  intended  application  and/or  data  products  are  already  known  and  will  never  change.  If  so,  then  a  triv¬ 
ial  solution  follows:  Simply  perform  the  application  processing  or  compute  the  desired  data  products,  and 
then  use  lossless  compression  of  the  results.  (This  solution  assumes  that  the  products  have  far  lower  stor¬ 
age/transmission  requirements  than  the  unprocessed  data.)  An  example  is  target  detection.  One  could  apply 
the  detection  algorithm  to  generate  a  binary-valued  image,  where  cleared  (0)  pixels  indicate  “no  target”  and 
set  (1)  pixels  indicate  “target”  This  binary  image  requires  far  less  storage  or  transmission  bandwidth  than 
the  HSI  data  from  which  it  was  derived. 

In  most  cases,  the  trivial  solution  is  not  a  realistic  choice.  Then  the  third  question  comes  into  play: 
Are  the  distortion  measures  used  in  traditional  lossy  compression  methods  sufficient?  If  so,  then  the  wealth 
of  knowledge,  experience,  and  implementations  of  traditional  lossy  compression  methods  can  be  employed. 
For  example,  an  enormous  amount  of  research  and  development  has  been  devoted  to  the  theory  and  practice 
of  lossy  compression  under  the  mean  square  difference  (also  called  mean  square  error)  distortion  measure. 
On  the  other  hand,  if  traditional  distortion  measures  do  not  apply,  then  new  ones  must  be  developed,  along 
with  suitable  lossy  compression  methods. 


1.2  APPROACHES  TO  LOSSY  COMPRESSION  AND  APPLICATION  PROCESSING 

Lossy  compression  followed  by  application  processing  is  a  two-stage  process,  as  depicted  in  Figure  2. 
The  first  stage  performs  lossy  compression  on  the  original,  noncompressed  HSI  cube,  denoted  by  X.  The 
compressed  version  of  the  cube  is  a  bitstream  (not  shown)  that  can  be  stored  or  transmitted  more  efficiently 
than  X\  decompression  of  the  bitstream  yields  a  decompressed,  approximate  cube,  denoted  byX.  In  the 
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Figure  2.  Lossy  compression  and  application  processing. 


second  stage,  application  processing  is  performed  onX.  As  a  result  of  lossy  compression,  X  is  not  an  exact 
replica  of  X;  the  inaccuracy  between  X  and  X  is  given  by  the  distortion  D(X,X)  >  0.  Such  distortion 
means  that  the  results  of  application  processing  onX  will  generally  differ  from  the  results  that  would  have 
been  obtained  if  processing  had  been  applied  directly  to  X .  Moreover,  good  application  performance  —  not 
storage  or  bandwidth  reduction  —  is  usually  the  ultimate  goal. 

We  identify  four  possible  approaches  to  this  problem.  In  the  conventional  approach ,  compression  and 
processing  are  designed  without  consideration  of  one  another.  Compression  attempts  to  minimize  D(  Xjt) 
subject  to  a  constraint  on  the  size  of  the  bitstream  for  the  compressed  cube.  Processing  is  designed  to 
operate  on  the  original,  noncompressed  data  X.  The  conventional  approach  is  based  on  the  idea  that,  ifX  is 
a  sufficiently  faithful  approximation  of  X ,  then  processing  X  should  produce  results  similar  to  those  from 
processing  X. 

In  the  backward  and  forward  approaches ,  one  of  the  two  stages  is  redesigned  to  accommodate  the 
other,  which  is  held  fixed.  Knowledge  about  the  fixed  stage  is  carried  backward  or  forward  (hence  the 
name)  into  the  other  stage.  In  the  backward  approach,  knowledge  about  an  existing  processing  stage  - 
intended  for  use  on  X  —  is  carried  backward  into  the  compression  stage,  which  is  altered  in  an  attempt  to 
make  X  “fit”  as  closely  as  possible  with  the  processing  method.  Conversely,  in  the  forward  approach,  the 
compression  stage  remains  fixed,  and  knowledge  about  its  behavior  is  carried  forward  into  the  processing 
stage,  which  is  modified  to  process  X  instead  of  X. 

Finally,  the  joint  approach  redesigns  both  stages  together  to  obtain  the  best  overall  performance. 
Some  realistic  constraints  must  be  introduced  to  prevent  this  approach  from  degenerating  into  a  trivial  solu¬ 
tion  (cf.  Section  1.1).  Consider  target  detection:  With  no  constraints,  a  trivial  joint  solution  simply  performs 
detection  on  X,  produces  a  binary  detection-indicator  image  X,  (e.g.,  a  pixel  in  X  equals  0  to  indicate 
“target  present”  or  1  to  indicate  “target  absent”),  and  then  compresses X  losslessly.  We  adopt  the  conven¬ 
tional  approach  for  several  reasons.  First,  the  backward,  forward,  and  joint  approaches  tend  to  be  highly 
specialized  for  a  single  application,  while  there  is  a  wide  variety  of  HSI  applications  (e.g.,  environmental 
monitoring,  background  classification,  material  identification,  target  detection,  and  anomaly  detection).  The 
assumptions  and  algorithms  employed  by  different  applications  also  vary  significantly.  The  conventional 
approach  means  that  the  compression  stage  can  be  implemented  separately  and  its  effects  then  evaluated  on 
a  number  of  applications.  Second,  the  nonconventional  approaches  often  introduce  additional  assumptions 
about  the  characteristics,  such  as  probability  distributions,  of  the  data  involved.  However,  characterization 
of  the  statistical  properties  of  HSI  data  remains  an  ongoing  challenge  [19,  29].  Finally,  lossy  compression 
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and  application  processing  are  each  sophisticated  fields  in  their  own  right,  and  the  associated  algorithms  are 
very  complicated.  The  nonconventional  approaches  can  become  intractable,  and  the  results  are  limited  to  a 
particular  application.  For  these  reasons,  most  current  work  on  examining  the  effects  of  HSI  compression 
on  applications  has  taken  the  conventional  approach,  and  we  do  so  in  this  report. 

1.3  UNDERCOMPRESSION  AND  OVERCOMPRESSION 

We  now  introduce  the  intuitive  notions  of  undercompression  and  overcompression.  Increasing  the 
amount  of  compression  discards  more  information,  which  may  degrade  application  performance.  As  more 
information  is  discarded  during  compression,  the  more  likely  application  performance  degradation  becomes. 
If  too  much  information  is  discarded  due  to  excessive  compression,  then  performance  becomes  unaccept¬ 
able.  Ideally,  we  would  like  to  compress  as  much  as  possible  while  still  maintaining  acceptable  performance. 
We  use  the  terms  undercompression  and  overcompression  to  describe  the  failure  to  reach  this  optimal  bal¬ 
ance. 


In  undercompression,  performance  remains  acceptable,  but  we  could  have  compressed  even  more. 
The  penalty  for  undercompression  is  excessive  storage  or  transmission  bandwidth.  While  such  wastefulness 
could  be  expensive,  if  we  realize  that  we  have  undercompressed  the  data,  we  might  be  able  to  compress  it 
further  or  increase  the  compression  ratio  in  subsequent  collections. 

In  contrast,  overcompression  results  in  both  unacceptable  performance  and  irretrievable  loss  of  in¬ 
formation.  Even  if  we  discover  that  our  application  is  underperforming  and  requires  data  with  greater 
accuracy,  there  is  no  way  to  recover  the  discarded  information.  The  penalty  for  such  a  mistake  could  be 
serious.  We  therefore  assume  that  undercompression  is  preferable  to  overcompression.  We  choose  to  err  on 
the  side  of  caution,  and  our  assessments  will  reflect  this  assumption. 


1.4  NOTATION 


We  expand  the  notation  here.  As  introduced  above,  we  denote  the  original,  noncompressed  cube  by  X 
and  a  decompressed  cube  by  X.  Let  Nx  and  Ny  denote  the  number  of  spatial  columns  and  rows,  respectively, 
where  a  spatial  coordinate  is  given  by  ( x ,  y ),  with  x  G  {1,2,...,  Nx },  y  G  {1, 2, ... ,  Ny}.  Similarly,  we 
denote  the  number  of  spectral  bands  by  7VC,  and  we  indicate  the  index  of  a  band  by  k  G  {1,2,...,  N(  }.  A 
single  data  value  in  X  is  indicated  by  X[x,  y,  k\.  Letting  x  =  (1,2,...,  Nx)  and  y  =  (1,2,...,  Ny ),  we 
use  X[x,  y,  k]  to  indicate  the  entire  Nx  x  Ny  image  of  data  values  in  the  A;th  band  of  X. 

With  k  similarly  defined,  the  notation  X[x,  y,k]  denotes  the  spectrum  associated  with  spatial  location 
(x,  y).  X[x,  y,  k)  is  sometimes  treated  as  a  column  vector  in  Nc -dimensional  Euclidean  space.  The  spectra I 
angle  between  two  spectra  u  and  v  is 


0(u,  v) 


(1) 


where  (•,  •)  denotes  the  usual  inner  product,  and  ||u||  =  y/ (it,  il) .  Spectral  angle  is  sensitive  to  differences 
in  the  spectral  shapes  of  u  and  vbui  insensitive  to  illumination  changes:  Scalar  multiplication  of  u  or  v  does 
not  change  0(u,v). 
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1.5  RATE  AND  COMPRESSION  RATIO 


For  any  given  data  representation  (e.g.,  A  or  X),  the  rate  of  the  representation  is  defined  as  rate  = 
(number  of  bits  required  by  the  representation )/(number  of  data  values  represented),  which  has  units  of  bits 
per  data  value.  For  hyperspectral  data,  one  often  employs  units  of  bits  per  pixel  per  band  (bpppb).  The  raw 
data  values  in  A"  are  typically  stored  using  D- bit  precision  (e.g.,  8-,  10-,  or  16-bit  integers),  so  the  rate  of  X 
is  B  bpppb.  Although  the  values  in  X  also  have  a  bitdepth  of  B ,  the  associated  bitstream  is  essentially  an 
identical  description  of  A.  Consequently,  the  rate  of  A  is 

^  bitstream  length  for  A  [bits] 

R  = - - - - - - —  bpppb.  (2) 

number  of  data  values  represented 


Similarly,  define  the  compression  ratio  r  by 

raw  size  of  A  [bits] 

r  =  - ; - . 

bitstream  length  for  A  [bits] 

Hence,  the  compression  ratio  is  a  dimensionless  number.  Finally,  R  and  r  are  simply  related  by 


(3) 

(4) 


1.6  DISTORTION  MEASURES 

The  preceding  section  discussed  the  rate  R  of  the  (de)compressed  dataA.  However,  A  only  approx¬ 
imates  A.  As  mentioned  in  the  introduction,  the  distortion  between  A  and  A  quantifies  the  inaccuracy  of 
the  approximation.  Most  lossy  compression  systems  are  designed  with  the  goal  of  minimizing  distortion  for 
a  given  rate. 

In  lossy  compression,  by  far  the  most  common  distortion  measure  is  the  mean  square  difference 
(MSD), 

MSD  (x,  x)  =  Y.T.  E  (*[*>!/.*]  -  X[x,y,k]f .  (5) 

x  y  c  x  y  k 

MSD  has  a  natural  interpretation  as  the  power  of  the  error  X—X.  Additionally,  its  differentiability  facilitates 
analysis,  and  it  is  easy  to  compute.  We  also  make  use  of  the  root  MSD  (RMSD), 

RMSD(X,X)  =  V/MSD(X,X).  (6) 

Although  our  lossy  compression  scheme  employs  MwSD,  we  can  compute  additional  distortion  mea¬ 
sures  to  see  how  they  relate  to  application  performance.  We  also  use  mean  absolute  difference  (MAD)  and 
mean  spectral  angle  difference  (MSAD),  defined  as 

MAD  (*,  x)  =  E  E  Z  !*[*’  V'  *1  -  *[*■  ?/’  |  ’  (7) 

x  y  c  X  y  k 

MSAD  (X,  X)  =  E  E  9  (*>’  2 /,  k])  .  (8) 

x  y  x  y 
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2.  COMPRESSION  SCHEME 

The  compression  scheme  that  we  are  evaluating  consists  of  the  cascade  of  spectral  compression  and 
spatial  compression.  Spectral  compression  is  accomplished  by  applying  principal-components  analysis 
(PCA)  in  the  spectral  dimension.  The  resulting  principal-component  images  (PC  images)  are  then  com¬ 
pressed  spatially  and  independently  using  the  JPEG2000  standard.  Other  researchers  [23,  13,  24,  20]  have 
implemented  similar  schemes.  In  this  section,  we  elaborate  on  the  main  parts  of  the  scheme,  which  is 
diagrammed  in  Figure  3. 

2.1  SPECTRAL  COMPRESSION  WITH  PCA 

In  the  signal-processing  literature,  PCA  is  commonly  referred  to  as  the  Karhunen-Ldeve  Transform 
(KLT).  The  details  of  PCA/KLT  are  well  documented  in  the  literature  [1 1],  so  we  only  briefly  review  it. 
Initially,  one  computes  the  spectral  mean  vector  A  and  spectral  covariance  matrix  Kxx  of  A.  Next,  Kxx 
is  factored  into  the  form  Kxx  =  4>A<I>T,  where  A  is  a  diagonal  matrix  whose  diagonal  entries  are  the 
eigenvalues  A^  of  Kxx  arranged  in  descending  order  (Ai  >  A2  >  •  •  •  >  Ayvc),  and  4>  is  a  unitary  matrix 
whose  columns  are  the  corresponding  eigenvectors  of  Kxx-  The  energy  of  X  is  then 

Nc 

energy  in  X  =  ^  A*.  (9) 

k= l 

Finally,  the  transform  matrix  <J>7  is  applied  to  each  spectrum  in  A"  to  create  a  PCA/KLT  cube  Y  via 

Y[x,y,k]  =  $T(x[x,2,,£]-X).  (10) 

The  A;th  band  Y[x,  y ,  A:]  is  called  the  A;th  PC  image. 

To  compress,  some  of  the  PC  images  in  Y  are  first  discarded.  The  retained  PC  images  may  then  be 
compressed  spatially  —  the  subject  of  Section  2.2  —  and  stored  or  transmitted  as  a  bitstream.  X  and  $ 
are  also  stored  or  transmitted  as  side  information  for  decompression.  We  assume  these  values  are  noncom- 
pressed,  Z?Sjde-bit  quantities,  so  they  require  Sjjde  —  (Nc  +  N% )Z?Side  bits. 

To  decompress,  the  bitstream  is  first  decompressed  to  obtain  approximations  of  the  retained  PC  im¬ 
ages,  and  any  discarded  PC  images  are  filled  with  zeroes  to  make  an  approximate  PCA/KLT  cubeA.  Then, 
with  the  aid  of  the  side  information  X  and  4>,  the  decompressed  HSI  cube  X  is  computed  via 

X[x,  y,  k]  =  y,  k]  +  X.  (11) 

PCA/KLT  has  a  number  of  appealing  properties  for  compression  and  signal  approximation.  First, 
since  4>  is  unitary,  energy  is  preserved.  This  property  is  useful  when  the  MSD  distortion  measure  is  used 
because  MSD(A,  A)  =  MSD(E,  Y).  Second,  PCA/KLT  decorrelates  X  spectrally:  the  spectral  covariance 
matrix  K\  y  —  A,  so  the  PC  images  are  uncorrelated,  which  is  convenient  if  they  are  to  be  processed 
independently.  Also,  the  variance  of  the  A:th  PC  image  is  c%  =  A&.  Third,  among  all  unitary  transforms. 
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Figure  3.  Block  diagram  of  compression  and  decompression. 


PCA/KLT  “packs”  the  maximum  amount  of  the  energy  of  X  into  the  first  PC  images.  The  percentage  of 
total  energy  contained  by  the  A;th  PC  image  Y[x,  y ,  k]  is 

percent  of  total  energy  from  A;th  PC  image  =  — ^ —  x  100%.  (12) 

Ee=  i  A, 

The  first  k  PC  images  thus  account  for  the  following  percentage  of  total  energy: 


percent  of  total  energy  from  first  A;th  PC  images 


-  x  ioo%. 


(13) 


For  many  kinds  of  multidimensional  data,  the  vast  majority  (e.g.,  >  99%)  of  the  energy  is  contained  in 
the  first  few  PC  images.  Figure  6  in  Section  4.1  shows  the  eigenvalues  and  the  individual  and  cumulative 
percentages  of  the  total  energy  for  an  example  HSI  cube.  Fourth,  for  an  A£ -dimensional  Gaussian  source 
and  MSD  distortion,  PCA/KLT  is  the  theoretically  optimum  transform  in  a  rate-distortion  sense. 


Despite  the  above  properties  of  PCA/KLT,  we  would  be  remiss  not  to  mention  some  of  its  potential 
drawbacks  [30].  First,  PCA/KLT  is  data-dependent,  so  4>  will  change  from  one  cube  to  another,  and.Y  and 
$  must  be  stored  or  transmitted  as  side  information.  Second,  <&T  does  not  have  any  special  structure,  so  the 
computational  complexity  of  PCA/KLT  is  O(N^).  In  contrast,  other  data-independent,  fast  signal  transfor¬ 
mations  (e.g.,  discrete  cosine  or  wavelet  transform)  require  no  side  information  and  have  complexities  of 
0{NC  logNc).  Third,  most  of  the  optimality  properties  of  PCA/KLT  are  associated  with  Gaussian  data  and 
MSD  distortion.  However,  HSI  data  do  not  usually  have  a  Gaussian  distribution  [19,  29],  and  MSD  may  be 
inappropriate  for  HSI  applications  such  as  classification  and  identification. 


Nevertheless,  PCA/KLT  remains  an  extremely  popular  and  effective  tool  in  the  remote-sensing  and 
lossy-compression  communities.  MSD  does  offer  an  analytically  tractable  distortion  measure,  and  it  is 
reasonable  to  expect  acceptable  application  performance  if  X  remains  faithful  to  X.  Also,  the  energy¬ 
preserving,  energy-packing,  and  decorrelation  properties  always  hold,  and  PCA/KLT  often  provides  good 
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compression  performance  for  non-Gaussian  data.  PCA/KLT  forms  a  crucial  part  of  many  HSI  compression 
schemes  [13,  20,  23,  24,  26],  including  ours. 


2.2  SPATIAL  COMPRESSION  WITH  JPEG2000 

Because  PCA/KLT  is  performed  only  in  the  spectral  direction,  the  PC  images  are  spectrally  un¬ 
correlated,  but  each  individual  PC  image  Y[x,  y,  k]  remains  spatially  correlated.  Spatial  compression  is 
therefore  applied  to  each  individual  PC  image  1 ;  our  compression  scheme  employs  JPEG2000,  a  state- 
of-the-art,  wavelet-based  image -compression  standard  [27].  JPEG2000  possesses  a  number  of  important 
properties.  First  and  foremost,  it  has  significantly  better  rate-distortion  performance  than  many  other, 
older  image-compression  methods,  including  JPEG.  Additional  features  include  progressive  transmission, 
random-access  and  region-of-interest  coding,  and  error  resilience.  The  details  of  the  standard  go  far  beyond 
the  scope  of  this  paper,  so  we  only  describe  the  basic  operations  here. 

Given  the  A;th  input  PC  image  Y[x ,  y ,  A:],  the  JPEG2000  compression  scheme  computes  a  two-dimensional, 
separable  discrete  wavelet  transform  (DWT)  W[x,y,k]  of  Y[x,y,k]\  our  implementation  uses  the  irre¬ 
versible  9-7  wavelet.  Next,  the  wavelet  coefficients  in  W[x,y,  k]  are  quantized  to  produce  the  quantized 
wavelet  coefficients  Wq[x,  y,  k\.  Each  subband  of  Wq[x,  y,  k]  is  then  partitioned  into  rectangular  code¬ 
blocks,  and  each  code-block  is  coded  independently  using  arithmetic  encoding  in  conjunction  with  an  op¬ 
erational  rate-distortion  optimization  algorithm  known  as  EBCOT  (embedded  block  coding  with  optimized 
truncation)  [27]. 

In  decompression,  arithmetic  decoding  and  dequantization  of  the  code-blocks  give  the  decompressed, 
quantized  wavelet  coefficients  Wq[x,  y ,  k].  Then  the  inverse  DWT  (IDWT)  is  applied  to  produce  the  approx¬ 
imated  PC  image  Y[x,  y ,  k]. 

2.3  RATE  ALLOCATION  AMONG  PC  IMAGES 

To  this  point,  we  have  described  the  two  main  compression  steps:  spectral  compression  via  PCA/KLT 
and  spatial  compression  of  the  individual  PC  images  with  JPEG2000.  Each  PC  image  Y[x,y,k\  can  be 
assigned  a  different  spatial  rate  7^pat^,  and  the  EBCOT  algorithm  ensures  that  it  will  be  spatially  compressed 
to  minimize  MSD(Y[if,  y,  k\,Y[x,  y,  A;])  for  the  given  value  of  /2sPat,A>  Thus,  if  the  overall  rate  RmT\  is 
given.2  we  must  still  choose  the  spatial  rates  {/?Spaa};  this  is  the  problem  of  rate  allocation . 


Equal-Rate  Allocation  The  first  method  uses  equal-rate  allocation.  Given  i^vr|,  one  simply  chooses  JV' 
(K  <  7Vr),  the  number  of  PC  images  to  retain,  and  sets  /2>pat,fc  to  be  the  same  for  all  of  them.  Because  of 
the  energy-packing  property  of  PCA/KLT,  the  first  PC  images  contain  most  of  the  energy  in  X,  so  we  keep 
PC  images  1 , 2, . . . ,  N'r  and  discard  the  remaining  ones.  The  spectral  compression  ratio  is  simply 


7’spec  — 


Nc 

n;: 


(14) 


'The  PC  images  are  spectrally  uncorrelated  hut  not  necessarily  spectrally  independent ;  however,  as  in  many  separable  compres¬ 
sion  schemes,  any  remaining  dependence  is  typically  assumed  to  be  weak  and  is  ignored. 

2 Recall  that  the  equivalent  overall  compression  ratio  is  rovri  =  B/R^r i. 
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It  follows  that  the  retained  PC  images  all  have  the  same  spatial  rate, 

^spat,A;  —  ^ovrl  i  k  =  1,  2, .  .  .  ,  Nc.  (15) 

There  is  nothing  optimal  about  this  method,  but  it  is  a  simple  approach  that  we  use  mainly  as  a  basis  for 
comparison  with  the  allocation  method  below. 


MSD-Optimal  Allocation  The  equal-rate  allocation  uses  the  same  rate  7^pa u  for  all  retained  PC  images. 
However,  the  variances  of  the  PC  images  can  differ  widely.  For  example,  the  first  PC  image  might  contain 
80%  of  the  energy,  and  the  fifth  only  1-2%;  less  distortion  should  result  by  making  7?pat,i  greater  than 
#spat,5-  This  section  reviews  a  common  procedure  for  selecting  {itljpat,*}  for  MSD-optimal  allocation . 

The  distortion-rate  function  D(R)  of  information  theory  [5]  gives  the  theoretical  lower  bound  D  on  the 
resulting  distortion  when  data  produced  by  a  source  are  compressed  at  a  rate  R.  For  TV  independent  sources 
with  respective  distortion-rate  functions  Dk(Rk),  the  problem  of  rate  allocation  [27,  30]  is  to  select  {/t*} 
to  minimize  the  overall  distortion  (1  /Nc)  Dk(Rk)  subject  to  the  rate  constraint  (l/TVr)  '}Tk  Rk  =  R^r\. 
Applying  the  Lagrange  multiplier  method  to  solve  this  problem  yields  the  equal-slope  or  Pareto  condition 
dDfJdRk  =  —7/*,  V/j,  where  the  optimum  Lagrange  multiplier  value  rf  >  0  is  chosen  to  satisfy  the  rate 
constraint  jRovri- 

For  the  purpose  of  rate  allocation  in  this  work,  we  use  IVLSD  distortion  and  assume  each  spectrum 
X[x,y,  k]  is  the  output  of  a  vector  Gaussian  source  with  mean  vector  X  and  covariance  matrix  K\x- 
Since  PCA/KLT  is  a  unitary  transform,  the  MSD  in  the  PCA/KLT  domain  equals  the  MSD  in  the  usual 
HSI  domain,  and  the  PC  images  Y[x,y,k\  represent  outputs  from  Nc  independent  Gaussian  sources  with 
respective  variances  ojj!  =  Xk.  With  this  assumption,  rate  allocation  can  be  accomplished  in  a  straightforward 
manner  using  a  bisection  search.  A  review  of  the  rate  allocation  problem  is  given  in  Appendix  A. 

The  assumption  of  Gaussian  data  is,  of  course,  a  crude  approximation.  Better  allocation  results  could 
be  achieved  by  modeling  the  distributions  of  the  PC  images  more  accurately.  Alternatively,  one  could 
implement  an  operational  allocation  method,  which  compresses  and  decompresses  each  PC  image  over  a 
range  of  rates,  tabulates  the  resulting  PCA/KLT-domain  MSDs,  and  then  performs  discrete  optimization. 
Such  methods  require  a  substantial  increase  in  complexity  and  were  therefore  not  used  in  this  study. 
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3.  DERIVED  DATA  PRODUCTS 


Remotely  sensed  HSI  data  can  he  used  to  characterize  the  environment,  both  on  the  ground  and  in  the 
atmosphere.  In  this  section,  we  review  two  data  products  derived  from  HSI  data. 


3.1  NORMALIZED  DIFFERENCE  VEGETATION  INDEX 


The  first  data  product  is  the  normalized  difference  vegetation  index  (NDVI).  NDVI  measures  spectral 
differences  around  the  red  edge  (red  and  NIR  wavelengths)  and  is  commonly  used  to  represent  the  health 
and  amount  of  vegetation  [28].  The  NDVI  is  defined  as 


NDVI  = 


p(NIR)  —  p(red) 
p(NIR)  H-p(red)’ 


(16) 


where  ^(NIR)  and  p(red)  represent  near-infrared  and  red  band  reflectances,  respectively. 


The  rationale  behind  (16)  can  be  understood  as  follows.  The  chlorophyll  present  in  healthy  vegetation 
absorbs  visible  light  in  the  red  wavelengths  and  reflects  infrared  radiation.  Consequently,  the  surface  spectra 
of  healthy  and/or  dense  vegetation  generally  exhibits  a  decrease  in  the  radiance  at  red  wavelengths  and  a 
large  increase  at  near  infrared  (NIR)  wavelengths.  The  spectra  of  other  land-cover  types  may  have  high 
values  in  the  NIR  or  low  values  in  the  red,  but  vegetation  tends  to  be  unique  in  having  this  combination. 
The  numerator  in  (16)  follows.  The  denominator  in  the  equation  normalizes  for  factors  such  as  slope  and 
changes  in  illumination  [17]. 


NDVI  takes  on  values  between  —1  and  +1.  Larger  positive  values  represent  increasingly  healthy  or 
dense  vegetation.  Values  near  zero  indicate  rock  or  bare  soil,  and  negative  values  are  associated  with  water, 
snow  and  ice,  or  barren  terrain.  NDVI  is  employed  to  distinguish  vegetation  from  other  land  cover  types 
and  to  assess  the  density,  health,  or  stress  of  vegetation.  It  also  finds  use  in  the  prediction  of  droughts  and 
identification  of  areas  where  the  risk  of  fire  is  high. 

In  this  work,  the  red  band  is  taken  as  the  average  intensity  from  626.19  nm  to  692.88  nm,  and  the 
near-infrared  band  from  779.14  nm  to  807.91  nm.  These  bands  correspond  to  the  spectral  resolution  of 
LANDSAT  bands  3  and  4,  respectively.  Additionally,  in  this  report,  NDVI  was  computed  using  radiance 
rather  than  reflectance  values.  Since  the  atmospheric  path  radiance  is  low  in  both  the  red  and  NIR  bands, 
atmospheric  correction  was  deemed  unnecessary  [25,  Ch.  5]. 


3.2  COLUMN  WATER  VAPOR 

The  second  data  product  considered  is  integrated  column  water  vapor  (CWV).  CWV  measures  the 
water  vapor  content  in  a  vertical  path  from  the  Earth’s  surface  to  the  sensor.  CWV  is  expressed  in  units  of 
grams  per  centimeters  or  more  often  just  centimeters  of  water  vapor  [7].  CWV  is  commonly  used  in  the 
atmospheric  compensation  process  to  define  the  atmospheric  transmission  due  to  water  vapor  [9]. 

Following  [7],  the  at-sensor  radiance  for  a  down-looking  airborne  sensor  can  be  expressed  as 

Lsensor  (A)  =  Lsun  (X)T(\)p(\)  +  Lpath(\), 


(17) 
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Figure  4.  Atmospheric  transmittance  of  water  vapor  for  nominal  atmospheric  conditions. 

where  Lsun{ A)  is  the  solar  radiance,  r( A)  (0  <  r( A)  <  1)  is  the  total  atmospheric  transmittance  from  the 
sun  to  the  surface  to  the  sensor,  p(A)  is  the  surface  reflectance,  and  Ipath(X)  is  the  path-scattered  radiance. 

Figure  4  displays  the  atmospheric  transmittance  due  to  water  vapor  across  the  VNIR/SWIR  for  nom¬ 
inal  atmospheric  conditions.  Water  vapor  in  the  atmosphere  absorbs  solar  and  surface-reflected  radiance  at 
wavelengths  near  720,  820,  910,  940,  1 140,  1380,  1880,  and  2180  nm  [7].  Around  these  wavelengths,  the 
atmospheric  transmittance  r( A)  displays  sharp  troughs.  The  wavelengths  associated  with  these  troughs  are 
called  water  vapor  absorption  bands.  The  figure  shows  two  absorption  bands  —  at  940  and  1 140  nm  —  in 
light  blue.  Typically,  r(A)  is  greater  than  0.90  or  even  0.99  outside  of  the  absorption  bands,  while  it  drops 
sharply  to  near  0.50,  0.20,  or  near  zero  in  the  absorption  bands.  The  spectrally  flat,  high-transmittance  bands 
adjacent  to  an  absorption  band  are  known  as  atmospheric  windows. 

The  surface  reflectance  /;(A)  of  natural  materials  generally  varies  smoothly  in  the  VN1R  region.  In 
particular,  p( A)  is  approximately  linear  across  the  water  vapor  absorption  bands.  If  the  transmittance  were 
not  attenuated  in  an  absorption  band,  then  the  at-sensor  radiance  L^ens0r{X)  would  be  approximately  linear 
between  the  atmospheric  windows  associated  with  the  absorption  band. 

CWV  is  estimated  based  upon  the  above  observations.  A  number  of  methods  for  estimating  CWV  ex¬ 
ist  [7,  8,  9,  14,  15],  so  we  describe  the  general  ideas  only.  First,  radiance  values  in  the  atmospheric  windows 
and  those  in  the  absorption  bands  are  used  to  estimate  a  pseudo-transmittance  f(A)  in  the  absorption  bands. 
This  step  exploits  the  linearity  of  /9(A)  described  above.  Second,  f(A)  is  compared  with  transmittances 
derived  from  radiative  transfer  models  with  differing  CWV  amounts.  The  estimated  CWV  is  then  extracted 
from  the  model  whose  transmittances  agree  most  closely  with  f(A). 

Some  methods  [7,  14]  employ  the  absorption  bands  at  940  and  1 140  nm,  while  others  [9]  use  an 
additional  absorption  band  at  910  nm.  The  other  absorption  features  are  usually  not  exploited,  either  because 
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they  do  not  attenuate  r(A)  sufficiently  or  because  their  attenuation  of  r( A)  does  not  vary  much  with  CWV. 
This  work  used  the  method  described  by  Griffin  and  Burke  [9]. 
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4.  EFFECTS  ON  DERIVED  DATA  PRODUCTS 


Experiments  were  performed  on  an  HSI  cube  from  AVIRIS  (Airborne  Visible  Infrared  Imaging  Spec¬ 
trometer).  AVIRIS  is  a  whiskbroom  HSI  sensor  with  614  pixels  in  the  cross-track  direction  and  spectral 
coverage  of  400-2500  nm  in  224  bands  with  a  nominal  spectral  resolution  of  10  nm/band.  The  data  was 
collected  from  an  airborne  platform  at  20  km  altitude  over  Moffett  Field,  located  near  the  San  Francisco  Bay 
in  California.  The  ground  sample  distance  for  this  data  is  approximately  20  m.  The  original  radiance  cube 
has  dimensions  of  614  x  512  x  224,  with  each  radiance  value  stored  as  a  16-bit  integer.  The  original  cube, 
designated  by  X ,  thus  occupies  134  MB.  Figure  5  shows  an  RGB  image  of  the  original  scene. 

X  was  compressed  using  the  method  in  Section  2  and  a  number  of  different  compression  settings. 
The  PCA/KLT  side  information  ( X  and  <I>)  required  Bsu\e  =  32  bits  per  value,  so  Sside  =  1612800  bytes. 
For  the  equal-rate  allocation  method,  N*c  was  set  to  5,  10,  or  20,  and  overall  compression  ratios  rovri  of  50, 
100,  or  200  were  chosen.  The  MSD-optimal  allocation  method  also  used  7oVri  —  50,  100,  and  200. 


Figure  5.  RGB  image  of  Moffett  Field  AVIRIS  cube. 


4.1  COMPARISON  OF  RATE  ALLOCATION  METHODS 

This  section  discusses  results  that  illustrate  the  differences  between  the  two  rate-allocation  methods. 
First,  the  upper  graph  in  Figure  6  shows  the  eigenvalues  \  associated  with  the  PC  images  Y[x,  y,  k]  after 
PCA  is  performed  on  X.  As  is  characteristic  of  PCA,  the  eigenvalues  appear  in  descending  order,  which 
shows  the  well-known  energy-packing  property  of  PCA.  The  lower  graph  in  the  figure  shows  the  percentage 
of  the  total  variance  of  X  formed  by  the  PC  images.  The  percentages  for  only  the  first  30  PC  images  are 
shown.  The  blue  curve  shows  the  individual  percentage  for  each  PC  image  (cf.  (12)),  and  the  red  curve 
shows  the  cumulative  percentage  (cf.  (13)).  The  first  PC  image  Y[x,  y,  1]  accounts  for  about  60  percent  of 
the  total  variance  in  X,  the  second  PC  image  Y[x,y,  2]  for  about  30  percent,  and  so  forth.  From  the  red 
curve,  the  first  five  PC  images  account  for  over  99%  of  the  total  variance  in  X. 

Figure  7  contains  three  graphs,  each  showing  the  resulting  spatial  compression  ratios  for  a  different 
overall  compression  ratio  rovri-  For  the  equal-rate  method,  the  graphs  show  essentially  horizontal  lines  since 
each  retained  PC  image  is  compressed  at  the  same  spatial  ratio  (cf.  (15)).  For  a  given  overall  compression 
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Figure  6.  Eigenvalues  and  variance  percentage  for  Moffett  Field. 
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ratio,  retaining  more  PC  images  means  that  the  retained  PC  images  must  be  compressed  more. 

However,  for  the  MSD-optimal  rate  allocation  method,  each  graph  shows  a  monotonically  increasing 
curve.  In  general,  the  MSD-optimal  allocation  method  compresses  the  first  2-3  PC  images  less  than  the 
equal-rate  method.  The  first  PC  image  is  spatially  compressed  the  least,  the  second  PC  image  is  compressed 
slightly  more,  and  so  forth.  The  trend  of  increasing  spatial  compression  ratios  agrees  with  the  fact  that 
higher-index  PC  images  contain  a  smaller  portion  of  the  total  variance  of  X  (cf.  Figure  6).  Compared  to  the 
lower-index  PC  images,  they  can  be  compressed  at  higher  spatial  compression  ratios  without  affecting  the 
overall  MSD  as  much. 

Next,  Figure  8  shows  two  different  distortion  measures,  RMSD(X,X)  and  MAD(X,  X),  plotted 
against  the  overall  compression  ratio.  The  connected  curves  show  the  trend  for  a  given  rate  allocation 
method.  The  three  unconnected  points  show  the  distortion  and  effective  overall  compression  ratio  when  only 
PCA  is  used,  and  the  retained  PC  images  are  not  compressed  spatially.  Under  both  RMSD  and  MAD,  the 
MSD-optimal  rate  allocation  method  performs  best.  Compared  to  using  PCA  only  (no  spatial  compression), 
the  MSD-optimal  method  yields  overall  compression  ratios  about  five  times  higher.  Compared  to  the  equal- 
rate  allocation  method,  the  MSD-optimal  method  produces  a  much  lower  distortion  at  the  same  overall 
compression  ratio. 

4.2  SPECTRAL  DISTORTION 

Figure  9  shows  the  average  power  in  X  versus  wavelength.  There  are  four  distinct  water  vapor  ab¬ 
sorption  bands  around  940,  1 100,  1370,  and  1850  nm.  The  green  bars  indicate  the  wavelengths  used  in  the 
computation  of  NDVI  (Section  3.1 ),  and  the  light  blue  bars  indicate  the  wavelengths  used  in  the  computation 
of  CWV  (Section  3.2). 

Finally,  Figure  10  shows  plots  of  the  ratio  of  the  average  power  to  the  average  MSD  versus  wavelength 
for  different  overall  compression  ratios  and  allocation  methods.  Not  surprisingly,  as  the  compression  ratio 
increases  from  50  to  200,  the  power-to-MSD  ratios  decrease  across  the  entire  spectrum.  Consistent  with 
Figure  8,  the  MSD-optimal  method  has  the  highest  power-to-MSD  ratio  in  almost  every  band  and  in  all 
three  graphs.  At  the  NDVI  wavelengths,  the  power-to-MSD  ratio  for  the  MSD-optimal  method  is  several 
decibels  higher  than  that  for  the  equal-rate  allocation  methods.  At  the  CWV  wavelengths,  the  MSD-optimal 
method  has  the  highest  power-to-MSD  ratio  when  rovri  =  50.  However,  when  7*OVri  =  100,  this  ratio  becomes 
comparable  to  that  for  the  equal-rate  method  with  =  10,  and  with  rovri  reaches  200,  this  ratio  falls  several 
decibels  below  the  ratio  for  the  equal-rate  method  with  N*c  =  10.  In  addition,  in  the  remaining  absorption 
bands  (near  1370  and  1850  nm),  all  of  the  allocation  methods  produced  comparable  power-to-MSD  ratios. 

These  observations  can  be  understood  by  considering  the  graph  in  Figure  9  and  the  allocation  meth¬ 
ods.  As  a  result  of  PCA,  the  errors  due  to  lossy  compression  are  distributed  across  all  bands.  The  absorption 
bands  —  particularly  those  around  1370  and  1850  nm  —  already  have  low  power,  so  they  show  corre¬ 
spondingly  lower  power-to-MSD  ratios  than  other  wavelengths.  With  regard  to  the  allocation  methods,  the 
MSD-optimal  method  attempts  to  minimize  the  MSD;  it  does  not  consider  the  power-to-MSD  ratio  in  any 
particular  band.  The  equal-rate  method  does  not  perform  any  optimization  at  all.  If  the  MSD  is  roughly 
evenly  distributed  among  all  bands,  the  low-power  bands  will  suffer  more  than  high-power  bands  in  terms 
of  their  power-to-MSD  ratios. 
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Figure  7.  Allocation  results  for  Moffett  Field. 
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Figure  8.  Distortion  vs.  compression  ratio  for  Moffett  Field. 
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Figure  9.  Power  vs.  wavelength  for  Moffett  Field. 


4.3  EFFECTS  ON  NORMALIZED  DIFFERENCE  VEGETATION  INDEX 


Figure  1 1  shows  the  NDVI  image  resulting  from  the  original,  noncompressed  HSI  data  cube.  The 
white-to-green  color  scale  shows  more  positive  values  of  NDVI  in  deeper  shades  of  green  and  smaller  or 
negative  values  in  white. 

NDVI  difference  images  with  and  without  compression  are  shown  in  Figure  12.  Each  column  com¬ 
pares  the  difference  images  for  different  rate  allocation  methods.  From  top  to  bottom,  the  methods  are: 
equal-rate  with  N'c  =  5,  equal-rate  with  N'c  =  10,  equal-rate  with  Nfc  =  20,  and  MSD-optimal.  Across  each 
row,  the  desired  overall  compression  ratio  increases  from  50  to  100  and  then  to  200.  The  color  bar  shows 
that  differences  closer  to  zero  are  white,  with  increasing  positive  and  negative  differences  shown  as  deeper 
shades  of  red  and  blue,  respectively. 

The  figures  show  that  the  equal-rate  method  with  =  5  has  the  greatest  NDVI  differences  across  all 
compression  ratios,  while  the  MSD-optimal  method  has  the  smallest  differences  at  each  compression  ratio. 
In  addition,  the  MSD-optimal  method  with  a  compression  ratio  of  100  has  differences  comparable  to  the 
equal-rate  method  with  Nfc  =  10  and  a  compression  ratio  of  50. 

Figure  13  shows  a  graph  with  the  MAD  between  the  NDVI  image  from  the  noncompressed  HSI  data 
and  the  NDVI  image  after  compression  for  a  variety  of  compression  settings.  In  addition,  the  unconnected 
points  show  performance  for  PCA  only  and  no  spatial  compression.  The  MSD-optimal  rate  allocation 
method  outperforms  the  other  allocation  methods.  In  agreement  with  the  previous  figure,  the  MSD-optimal 
method  with  7*0Vri  =  100  has  MAD  only  slightly  higher  than  the  equal-rate  method  with  7oVri  =  50.  Referring 
back  to  Figure  10,  we  see  that  the  MSD-optimal  method  maintains  a  higher  power-to-MSD  ratio  in  the  bands 
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Figure  10.  Power-to-MSD  ratio  vs.  wavelength  for  Moffett  Field. 
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Figure  1 1.  NDVI  image  for  Moffett  Field. 


used  for  computing  NDVI. 


4.4  EFFECTS  ON  INTEGRATED  COLUMN  WATER  VAPOR 

The  next  set  of  results  examines  the  effects  of  compression  on  the  CWV  product.  The  CWV  image 
for  the  noncompressed  HSI  data  appears  in  Figure  14,  where  the  blue  color  scale  shows  higher  CWV  values 
as  brighter  shades  of  blue.  Since  the  CWV  calculation  is  not  reliable  over  bodies  of  water,  they  have  been 
blocked  out  in  the  image. 

Figure  15  shows  CWV  difference  images.  The  top  row,  which  shows  results  for  the  equal-rate  al¬ 
location  method  and  JV'  =  5,  indicates  that  retaining  only  5  PC  images  is  likely  insufficient  regardless  of 
how  much  or  how  little  they  are  compressed  spatially.  Results  with  the  equal-rate  method  are  better  when 
10  or  20  PC  images  (second  and  third  rows)  are  retained.  The  MSD-optimal  method  performs  best  when 
rQvri  =  50;  at  this  compression  ratio,  it  retained  25  PC  images.  At  rovri  =  100,  the  MSD-optimal  method 
performs  comparably  to  the  equal-rate  method  with  =  10  and  slightly  better  than  the  equal-rate  method 
with  TV/.  =  20.  Finally,  at  r^vri  =  200,  the  equal-rate  method  with  N'c  =  10  has  the  best  performance,  and 
the  MSD-optimal  method  retained  only  5  PC  images  and  so  its  CWV  difference  image  looks  very  similar  to 
those  in  the  top  row. 

The  MAD  between  the  CWV  image  from  the  noncompressed  HSI  data  and  the  CWV  images  after 
compression  is  graphed  in  Figure  16.  The  individual  points  for  PCA  only  with  no  spatial  compression  show 
small  MAD  values  when  either  10  or  20  PC  images  are  retained  but  a  sharp  increase  in  MAD  when  only 
5  PC  images  are  retained.  The  connected  curves  show  results  when  spatial  compression  and  the  different 
rate  allocation  methods  are  applied.  With  an  overall  compression  ratio  of  fifty,  the  MSD-optimal  method 
and  equal-rate  method  with  =  10  or  AT'  =  20  perform  comparably.  As  the  overall  compression  ratio 
increases,  the  MSD-optimal  method  begins  to  perform  worse  than  the  equal-rate  methods. 

Recall  from  Figure  7  that  for  overall  compression  ratios  of  50,  100,  and  200,  the  MSD-optimal  method 
retained  28,  13,  and  5  PC  images,  respectively.  Hence,  at  a  compression  ratio  of  200,  the  MSD-optimal 
method  cannot  perform  any  better  than  only  using  PCA  and  retaining  the  first  5  PC  images.  In  addition. 
Figure  10  shows  that  in  the  bands  used  to  compute  CWV,  the  power-to-MSD  ratio  for  the  MSD-optimal 
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Figure  12.  NDV1  difference  images  for  Moffett  Field. 
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Figure  13.  NDVI  difference  vs.  compression  ratio  for  Moffett  Field. 


method  drops  below  that  for  the  equal-rate  method  with  =  10  and  N'c  =  20. 


4.5  DISCUSSION 

These  results  indicate  that  the  example  data  products  (NDVI  and  CWV)  can  tolerate  a  certain  amount 
of  lossy  compression.  They  also  show  that,  compared  to  using  PCA  alone  without  any  spatial  compression, 
PCA  and  spatial  compression  of  the  retained  PC  images  can  offer  significant  gains  in  compression  without 
degrading  results. 

In  addition,  the  importance  of  proper  rate  allocation  among  the  PC  images  is  evident.  In  general,  the 
MSD-optimal  rate  allocation  method  produced  better  results  than  the  equal-rate  allocation  method.  With 
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Figure  14.  CWV  image  for  Moffett  Field. 
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Figure  15.  CWV  difference  images  far  Moffett  Field. 


25 


Figure  16.  CWV  difference  vs.  compression  ratio  for  Moffett  Field. 


regard  to  distortion  measures  like  RMSD  and  MAD,  it  yielded  consistently  lower  distortions  than  the  equal- 
rate  method.  With  NDVI,  the  MSD-optimal  method  also  had  the  best  performance.  However,  with  CWV,  it 
did  not  always  perform  best. 

This  behavior  is  consistent  with  the  fact  that  MSD  distortion  is  an  average  measure  and  therefore 
does  not  consider  the  distortion  in  a  particular  spectral  band.  As  the  desired  compression  ratio  increases, 
the  MSD-optimal  allocation  method  retains  fewer  PC  images  and  spatially  compresses  them  more,  and 
PCA/KLT  distributes  the  distortion  across  all  bands.  NDVI  uses  bands  that  have  high  power,  and  therefore 
can  tolerate  some  distortion  before  the  power-to-MSD  ratio  becomes  too  low  and  the  NDVI  data  product 
becomes  degraded.  In  contrast,  CWV  employs  water-vapor  absorption  bands,  which  have  somewhat  lower 
power. 

The  equal-rate  methods  perform  no  optimization  in  their  selection  of  Nr  or  the  spatial  compression 
ratio  for  each  retained  PC  image.  Provided  that  rovri  and  Nj.  are  large  enough,  then  the  distortion  in  the 
absorption  bands  remains  small  enough  so  that  the  CWV  data  product  does  not  become  unacceptably  de¬ 
graded.  Note,  however,  that  there  is  no  way  to  predict  what  rovri  and  iV'  should  be. 
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5.  CLASSIFICATION  SCHEMES 

The  previous  section  considered  a  number  of  data  products  derived  from  HSI  data.  Another  important 
application  of  HSI  data  is  classification.  The  data  products  considered  above  yield  real-valued  images  of 
NDVI  or  CWV,  so  it  is  straightforward  to  examine  the  error  between  the  images  with  and  without  lossy 
compression.  However,  classification  assigns  a  discrete  label  to  each  pixel,  so  some  care  is  required  when 
evaluating  the  effects  of  compression.  The  next  few  sections  of  this  report  review  some  basic  classification 
schemes,  discuss  the  evaluation  of  classifier  performance,  and  then  study  how  compression  affects  classifi¬ 
cation  results. 

Given  an  Nx  x  Ny  x  Nc  HSI  cube  Z,  classification  is  the  process  of  assigning  each  pixel  Z[x,y,k] 
to  one  of  L  possible  classes.  The  L  classes  must  be  mutually  exclusive  and  totally  exhaustive  to  ensure 
that  each  pixel  is  assigned  to  one  and  only  one  class  [4].  Each  class  is  a  collection  of  pixels,  which  need 
not  be  spatially  connected  but  are  deemed  to  be  alike  in  some  way.  A  class  might  be  labelled  with  an 
associated  high-level  meaning  such  as  "road,”  "grass,”  or  "forest.”  Given  the  variability  of  natural  materials 
and  imaging  conditions,  such  high-level  designations  may  require  the  expertise  of  an  image  analyst.  With 
machine  classification,  a  class  usually  consists  of  pixel  spectra  that  are  close  together  in  some  quantifiable 
sense.  In  any  case,  classification  reduces  the  multidimensional,  variable  details  of  a  pixel  spectrum  to  a 
single,  discrete  designation.  The  result  of  classification  is  an  1%.  x  Ny  integer- valued  class  image  C,  where 
C[x,  y\  =  i  G  {1, 2, . . . ,  L}  indicates  that  the  pixel  Z[x,  y,k\  has  been  assigned  to  class  L 


5.1  CLASSIFICATION  ALGORITHMS 


In  this  work,  we  consider  two  simple  but  popular  supervised  machine  classification  algorithms:  the 
Euclidean  minimum  distance  (EMD)  classifier,  and  the  spectral  angle  mapper  (SAM).  The  algorithms  are 
supervised  in  the  sense  that  a  user  must  provide  them  with  a  spectral  library ,  a  set  C  =  {£ ,  s2, . . . ,  si }  of 
L  distinct  spectra  representing  the  different  classes. 

Central  to  both  classifiers  is  a  distance  measure  d(-,  •)  that  quantifies  how  close  a  spectrum  Z[x,  y<k] 
is  to  a  library  spectrum  S£  G  C.  EMD  uses  Euclidean  distance, 

d(u,  v)  =  ||?7  —  v\\. 


SAM  employs  spectral  angle. 


d(u,  v)  =  Q(u,  v). 


Given  the  cube  Z  and  spectral  library  £,  the  classifier  operates  as  follows:  For  each  spectrum  Z[x,  yjc],  the 
classifier  finds 

£*  —  argmin  d(Z[a;,  y, 


That  is,  £*  is  the  index  of  the  closest  library  spectrum.  Then  the  classifier  sets  C[x,  y]  =  t  to  make  the  class 
assignment.  Note  that  both  algorithms  operate  on  individual  pixel  spectra  only;  they  do  not  consider  spatial 
relationships. 


In  some  versions  of  these  classification  algorithms,  the  distance  d(Z[x,yJc],S£*)  must  also  be  below 
some  threshold;  otherwise,  the  pixel  is  labelled  as  unclassified  (C[x,y]  =  0).  This  option  was  not  exercised 
for  the  work  in  this  report. 
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These  classifiers  are  very  simple  but  nonetheless  worth  investigating.  Euclidean  distance  corresponds 
directly  to  MSD  distortion,  so  we  might  expect  a  simple  relationship  between  MSD  and  compression- 
induced  changes  in  EMD  classification.  In  contrast,  the  spectral  angle  between  two  spectra  can  be  very 
small  even  if  the  Euclidean  distance  between  them  is  large.  For  example,  consider  v  =  au  with  and  a  1; 
then  || u  —  v||  =  | a  —  l|||i?||,  but  0(u,  v)  =  0.  Consequently,  SAM  provides  a  case  where  the  distortion 
measure  used  for  compression  does  not  relate  to  the  distance  measure  used  for  classification. 


5.2  CONFUSION  MATRIX 

Classification  is  a  hard-decision  process;  each  pixel  must  be  assigned  to  exactly  one  class.  Pixels  will 
occasionally  be  assigned  to  the  wrong  class  for  a  number  of  reasons.  Natural  materials  possess  inherent 
variability.  Also,  the  spatial  area  being  imaged  by  a  single  HSI  pixel  might  contain  multiple  materials. 
Finally,  energy  reflected  or  emitted  from  nearby  materials  might  bleed  into  the  pixel  as  a  result  of  scattering. 
The  confusion  matrix  is  a  standard  tool  for  summarizing  the  correct  and  incorrect  assignments  generated  by 
a  classification  method  [4]. 

Let  CKf[x,  y]  be  a  truth  or  reference  classification  image.  Also,  let  C[x,  y]  be  the  class  image  returned 
by  a  classifier.  The  confusion  matrix  M  is  an  L  x  L  matrix  of  nonnegative  integers.  It  compares  the  actual 
or  assigned  class  image  C[x,y]  with  the  reference  class  image  QCf[x,y\.  Matrix  element  M{j  gives  the 
number  of  times  that  C[x,  y\  =  i  and  Cref[x,  y]  =  j,  i.e.,  the  number  of  times  the  classifier  assigned  a  pixel 
to  class  i  when  the  pixel  belongs  to  class  j  in  the  reference  class  image. 

Portions  of  the  subsequent  discussion  will  treat  C  and  Qef  as  realizations  of  two  ergodic,  multinomial 
random  processes,  C  and  Cref.  In  practice,  such  a  distinction  is  often  ignored.  However,  it  is  necessary  in 
the  development  of  statistical  tests  for  evaluating  classification  performance.  We  use  the  boldface  symbols 
to  distinguish  between  realizations  (C,  CJef)  and  stochastic  processes  (C,  Qef). 

We  quickly  review  some  quantities  commonly  associated  with  the  confusion  matrix.  First,  let  /prod(i) 
denote  the  producer's  accuracy  probability  for  class  j, 

P prodC?)  =  Pr(C[x,  y]  =  j|Cref[x,  y]  =  j). 

In  practice,  PprodO)  is  estimated  from  the  confusion  matrix  M.  Denote  the  sum  down  the  jth  column  of  M 
by  M.j  =  Yhi=\  Mij.  M.j  equals  the  true  number  of  pixels  from  class  j  that  are  present.  ThenPpr(Xj(j),  the 
estimated  producer’s  accuracy  probability  for  class  j,  is 


Pprod(j)  =  Pr(C[a:,  y]  =  j|Cref[x,y]  =  j)  =  — .  (18) 

•7  2—ji— 1  ij 

PProd  {j )  is  frequently  referred  to  as  the  producer's  accuracy  for  class  j.  It  tells  us,  of  all  the  reference  class-j 
pixels,  how  many  were  correctly  placed  into  class  j  by  the  classifier. 

Similarly,  the  user's  accuracy  probability  for  class  i  is  denoted  by  ijJserW* 

Puser(i)  =  Pr(Cref[x,  y]  =  i\C[x,y]  =  i). 
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This  probability  is  also  estimated  from  the  confusion  matrix.  The  sum  across  the  ith  row  of  M  is  given  by 
=  Ylj=i  Mij,  the  total  number  of  pixels  put  into  class  i  by  the  classifier.  The  estimate  of  li ser(z)  is  then 


Puserii)  =  Pr(C  ref[x,y]  =  i\C[x,y]  =  t)  =  ^ 


Mil 


Ei=i  Mij 


(19) 


Puser(z)  is  commonly  called  the  user's  accuracy  for  class  i ,  and  it  tells  us,  of  all  the  pixels  that  the  classifier 
placed  —  both  correctly  and  incorrectly  —  into  class  z,  how  many  are  tally  in  class  z. 


Finally,  the  overall  accuracy  probability  Pwx\  is  defined  as 

L 

Povri  =  Pr(C[a:,  y\  =  Cref[a:,  y])  =  ^  Pr(C[z,  y]  =  j,  Cref[x,  y)  =j).  (20) 

i= 1 

Povri  is  also  often  referred  to  as  the  probability  of  correct  classification  (i?c)-  Povri  can  be  estimated  from  M 
as  follows.  Let  M..  denote  the  sum  of  all  elements  of  M  (so  M.  =  NxNy).  Then  the  estimate  of  Povri  is 

L 

Povri  =  Pr(C[x,  y]  =  Cref[x,  y])  =  ^Pr(C[a:,y]  =  j,Cref[x,y]  =  j) 

j  =  1 


=  Pr(c[*>  V\  =  j|Cref[x,  y]  =  j)  Pr(Cref[x,  y]  =  j) 

i= i 

_  ’r'  Mjj  M.j  1 

~~  S  W]  m"  “  aT.  S 

j= i  3  j= i 


(21) 


where  Pr(Cref[a:,  y]  =  j)  is  estimated  via 


Pr(Cref[x,  y\  =  j)  —  — J-  =  — -  Mjj,  j  =  1, 2, . . . ,  L. 


(22) 


2—  1 


The  estimate  Povri  is  known  as  the  overall  accuracy . 


5.3  KAPPA  COEFFICIENT 


The  producer’s,  user’s,  and  overall  accuracy  probabilities  measure  different  aspects  of  classification 
performance.  Taken  individually,  none  is  sufficient  for  summarizing  the  performance  of  a  classifier  [4].  For 
a  particular  class,  the  producer’s  accuracy  probability  might  be  high,  but  the  user’s  accuracy  probability  low, 
and  vice  versa.  Also,  the  overall  accuracy  Povri  captures  only  the  diagonal  of  the  confusion  matrix.  Note 
that  the  probability  of  error  Pe  is  simply  PE  —  \  —  Povr i,  so  its  estimate  PE  =  1  —  Povri  only  includes  the 
diagonal  of  the  confusion  matrix  as  well. 


The  kappa  coefficient  K  provides  an  alternative  scalar  measure  of  classification  performance  [3,  4] 
that  compares  the  overall  accuracy  probability  against  the  probability  of  chance  agreement.  K  is  defined  as 


K  = 


Povri  Pchance 
1  —  -^chance 


(23) 
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where 


P :hance  —  ^  ^  Pr(C[x,y]  —  i)  Pr(Cref[x,  y]  —  i) 


(24) 


2—1 


is  the  probability  that  a  pixel  is  correctly  classified  by  chance. 


K ,  an  estimate  of  K,  can  be  computed  from  the  confusion  matrix.  First,  the  estimate  follows  the  form 
of  (23): 

Povrl  P :hance 


K  = 


1  P :hance 


(25) 


Second,  Povr i  is  estimated  from  M  as  in  (21).  Third,  France  is  estimated  by  combining  Pr(Cref[:r,  y\  =  j) 
in  (22)  with 

Pr(C[x,y]  =  *)  =  TJ1  =  t  =  l,2,...,L,  (26) 


j=  1 


so  that 


1  ^  _  i  L 

^chance  =  Pr(C[x,y]  =  *)Pr(Cref[x,  y]  =  i)  =  —  ^  Mj.M.j. 


2=1 


(27) 


"  t=i 


Finally,  the  resulting  maximum  likelihood  estimate  of  Ff  is  obtained  by  substituting  the  equations  forPovri 
and  Pchance  into  (25): 


K  = 


(28) 


M.2  -  E.ii  M-Mi 

Equation  (28)  is  the  expression  for  kappa  given  in  many  documents  (e.g.,  [21,  Ch.  6])  but  without  much 
explanation  of  the  associated  stochastic  interpretation. 

Unlike  Pprod(j),  PUser(*)»  anc*  Po\t\  (or  P#),  k  incorporates  both  the  diagonal  and  marginals  (column 
and  row  sums)  of  the  confusion  matrix.  It  does  not,  however,  take  into  account  the  producer’s  and  user’s 
accuracies. 

Some  intuition  into  the  form  (23)  of  K  can  be  gained  by  writing  Poxx\  as 

L 


Povri  =  y^Pr(C[a:,y]  =  i|Cref[x,y]  =  i)  Pr(Cref[x,y]  =  *). 


(29) 


i=i 


In  this  equation,  the  conditional  probability  Pr(C[x,y]  =  i|Q*f[a:,  y\  =  i)  reflects  the  dependence  of  the 
classifier  output  C[x,y]  upon  the  actual  reference  class  value  C^ffrr,  y].  If  the  classifier  accurately  labels  a 
pixel  as  class  i  when  the  pixel  truly  belongs  to  class  i,  then  Pr(C [x,y]  =  z‘|Qef[a:, y]  =  i)  should  be  close 
to  unity.  If  the  classifier  performed  perfectly,  then  Pr(C[x,  y]  =  z|Qef[x,y]  =  i )  would  equal  1  for  i  =  1, 
2, . . . ,  L,  and  Povri  would  equal  1. 

Next,  the  chance  agreement  probability  Chance  can  be  considered  the  probability  of  correct  classifi¬ 
cation  if  the  classifier  output  C[x,y]  were  conditionally  independent  of  the  reference  pixel  value  CJef[x,  y], 
i.e.,  if 

Pr(C[a:,y]  =  ?|Cref[a:,y]  =  i)  =  Pr(C[z,y]  =  *). 
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Then  (24)  can  be  written  as 


L 

France  =  ^  Pr(C[^2/]  =  «lCref  [x,y\  =  i)  Pr(Cref[^,  ?/]  =  i) 
i=l 
L 


=  ^Pr(C[:r,y]  =  i)  Pr(Cref[x, y]  =  i ). 

i=i 


C  and  Cref  conditionally  independent 


Conceptually,  PChance  is  the  probability  of  correct  classification  that  would  result  if  the  classifier  output 
C[x,y]  bore  no  relation  to  the  actual  class  value  Qef [x,y\-  It  can  also  be  thought  of  as  the  expected  prob¬ 
ability  of  correct  classification  if  the  class  labels  in  C[x,y\  were  assigned  randomly  using  the  distribution 

Pr(C[:r,  y]  =  z).3 

From  this  discussion,  even  if  the  classifier  output  C[x,y]  bears  no  relation  to  the  actual  class  labels 
Cref[#,  2/],  we  exPect  to  have  correct  classification  with  probability  Chance-  Also,  the  best  possible  proba¬ 
bility  of  correct  classification  is  unity.  In  (23),  the  kappa  coefficient  K  first  removes  the  bias  due  to  chance 
agreement  and  then  normalizes  the  result  /?)vr i  -  France  by  the  best  possible  result,  1  -  Chance- 

Values  of  K  close  to  unity  indicate  that  classification  is  better  than  chance,  while  a  value  of  zero  means 
the  classification  is  no  better  than  chance.  ( K  can  even  take  on  negative  values,  although  this  occurs  very 
rarely  and  indicates  that  the  classifier  is  performing  worse  than  chance.  In  such  a  case,  one  would  be  better 
off  simply  assigning  class  labels  in  C[x,y\  at  random,  rather  than  using  the  classifier.)  Some  authors  [16] 
have  also  suggested  the  following  scale  for  evaluating  the  agreement  between  C  and  Qe f. 


0.8  <  K  <  1 
0.4  <  K  <  0.8 
K  <  0.4 


strong  agreement; 
moderate  agreement; 
poor  agreement. 


5.4  KAPPA  ANALYSIS 


Kappa  analysis  is  a  method  for  determining  whether  or  not  classification  differences  are  statistically 
significant  [4].  In  the  stochastic  interpretation  of  kappa  analysis,  K  is  itself  a  random  variable,  andA"  in  (28) 
is  an  estimate  of  K.  Kappa  analysis  also  provides  an  estimate  o2^.  of  the  variance  of  K.  The  estimates  of 

K  and  a1^  can  be  used  to  provide  confidence  intervals  on  K.  We  repeat  the  expression  from  Congalton  and 
Green  [4,  p.  50]: 


0{(l -0i)  2(1-00(20102-03)  (l-  0,)2(04  -  40|)l 

(1-02)2  '  (1  -02)3  (1  -02)4 

where 


**  =  w: 


(30) 


=  m  e3  =  ±-2'£Mi,(Ml.+M.i),  (31) 


1=1 


M.2 


M.2 


'This  is  roughly  analogous  to  rolling  a  pair  of  fair  dice  many  times:  one-sixth  of  the  time,  “doubles”  will  occur,  even  though  the 
dice  do  not  influence  one  another. 
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and 


«<  =  ap££m^.+m,)2. 


(32) 


i=l  j= 1 


Statistical  significance  tests  can  be  applied  because,  under  the  multinomial  sampling  model  of  kappa 
analysis,  K  is  asymptotically  normally  distributed.  For  example,  if  K  =  0,  then  the  classifier  does  not 
perform  any  better  than  random  classification.  A  statistical  test  to  determine  whether  or  not  K  =  0  consists 
of  the  competing  hypotheses  [4]: 

H0  :  K  =  0; 

HX:K  ^  0. 

With  the  estimates  K  and  a2-,  the  appropriate  test  statistic  is 

A 


which  has  a  standard  normal  distribution.  Bq  is  rejected  if  Z  >  za/2  at  significance  level  a.  Otherwise,  Hq 
is  accepted. 


5.5  OTHER  STATISTICAL  MEASURES  OF  CLASSIFICATION  PERFORMANCE 

A  number  of  measures  of  classification  performance  similar  to  K  have  been  proposed.  Although  they 
are  not  used  in  this  study,  they  nonetheless  deserve  mention.  Like  K  in  (23),  these  other  measures  have  the 
form 

^ovrl  —  -^bias 
1-Pbias  ’ 

where  Povri  is  the  overall  accuracy  probability,  and  /foas  is  a  bias  probability  that  is  subtracted  to  eliminate 
the  influence  of  correct  classifications  that  occur  purely  by  chance,  /^vri  is  given  by  (20)  and  estimated 
using  (21 ).  The  classification  measures  differ  in  their  choice  of  the  bias  probability  ig jas. 

For  the  kappa  coefficient,  the  bias  probability  is 

L 

-Pbias  =  ^chance  =  ^  ^  Pr(C[x,y]  =  i)  Pr(Cref[x,  y]  =  i). 

2=1 

In  practice,  PChance  is  estimated  from  the  empirical  estimates  Pr(C[x,  y]  =  i)  and  Pr(Cref[:r,  y]  =  i)  of  (26) 
and  (22). 

Brennan  and  Prediger  [1],  Foody  [6],  and  Ma  and  Redmond  [18]  suggest  using  iftas  =  l/L.  Ma 
and  Redmond  argue  that,  when  Crd  is  generated  by  unsupervised  classification  methods,  there  is  no  prior 
knowledge  of  how  many  pixels  will  likely  be  assigned  to  each  class,  so  the  a  priori  probabilities  should  be 
the  same  for  all  classes;  i.e.,  Pr(Cref[x,  y]  =  i)  =  l/L ,  i  =  1,  2, . . . ,  L.  Consequently, 

1  1  1 

iaS  =  22  Pr(c[x>  v]  =  *)  2;  =  (33) 

i=  1 
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which  results  in  the  performance  measure 


_  ^ovrl  1/^ 

e  “  1-1  /L  ’ 


(34) 


Ma  and  Redmond  call  Te ,  “Tau  for  use  with  classifications  based  on  equal  probabilities  of  [class]  member¬ 
ship”  [18].  The  same  measure  results  in  the  case  of  supervised  classification  when  the  operator  supervising 
classification  assumes  Pr(Cref[x,  y\  =  i)  =  l/L.  The  estimated  value  ofTe  is  Te  =  {P0^r\-l/L)/(l-l/L). 

Ma  and  Redmond  also  suggest  a  generalization  in  which  Pr(Qef[x,y]  =  i )  is  estimated  from  other 
sources,  such  as  “a  previous  survey  of  group  population,  an  estimation  of  group  population  from  aerial 
photos,  or  an  arbitrary  decision  for  the  importance  of  groups”  [18].  In  this  case,  they  denote  the  measure  by 
Tp  =  (Povri  —  Pbias)/(1  —  -Pbias)*  with  the  corresponding  estimate 


TP  = 


P 3vrl  Pbias 

1  “  Pbias 


(35) 


where  Povr|  is  given  by  (21 ),  and 
L 


4 ias  =  £  Pr(C[rr,  y]  =  i)  Pr(Cref[z,  y]  =  i)  =  £  Af<.  Pr(Cref[x,  y]  = 


i=  1 


M.  ^  1 

i=i 


0, 


(36) 


Equations  for  the  variances  of  Te  and  Tp  and  test  statistics  for  evaluating  classifier  performance  appear 
in  [18]. 

Additional  measures  of  classifier  performance  that  are  related  to  kappa  are  “conditional  kappa”  and 
“weighted  kappa”  [4].  Conditional  kappa  is  useful  when  one  is  interested  in  the  classifier  performance  for 
a  particular  class.  Weighted  kappa  is  appropriate  when  certain  classification  errors  are  more  important  than 
others  and  can  be  weighted  accordingly. 


5.6  ONE-KAPPA  TEST  FOR  ASSESSMENT  OF  COMPRESSION  EFFECTS 

Kappa  analysis  as  described  in  Section  5.4  is  typically  used  to  determine  whether  or  not  a  classifier 
performs  significantly  better  than  chance  assignment,  or  whether  or  not  two  confusion  matrices  are  signif¬ 
icantly  different.  In  a  similar  way,  we  apply  kappa  analysis  on  one  or  two  confusion  matrices  to  evaluate 
the  (dis)similarity  between  classifications  before  and  after  lossy  compression.  These  tests  allow  one  to 
determine  whether  or  not  lossy  compression  has  had  a  significant  effect  on  the  classification  results. 

The  first  method,  or  one-kappa  test ,  involves  a  single  confusion  matrix  or  kappa  coefficient.  The  test 
assesses  the  agreement  between  classification  results  from  the  same  classifier  before  and  after  compression. 
A  classifier  is  first  applied  to  Ar,  the  entire  original,  noncompressed  cube,  and  the  resulting  class  image 
provides  C^g-  Then  the  same  classifier  is  applied  to  X,  a  decompressed  version  of  the  cube,  to  produce  a 
post-compression  class  image  Ccomp-  The  question  is,  “For  a  given  classifier,  does  compression  significantly 
change  its  results ?” 

To  answer  this  question,  we  obtain  the  confusion  matrix  and  kappa  coefficient  K  from  C^ng  and  Ccomp 
and  then  perform  kappa  analysis.  A  diagram  of  the  approach  is  shown  in  Figure  17.  We  assume  that  some 


33 


Figure  17.  Diagram  of  one-kappa  test. 


minimum  acceptable  kappa  coefficient  Km\n  G  [0, 1]  is  given;  Km\n  should  be  close  to  unity.  If  K  >  Km\n , 
then  we  declare  that  Ccomp  is  acceptable;  otherwise,  we  declare  that  Cco mp  differs  too  greatly  from  Cong. 
This  idea  yields  the  following  test  with  maximum  significance  level  a: 

Hq  :  K  =  Km\n,  (overcompression); 

H{  :  K  >  Km in,  (acceptable). 

The  test  statistic  is 

Zx  =  k~Km'\  (38) 

°k 

where  Z\  has  a  standard  normal  distribution  [4].  We  accept  H\  if  Z\  >  za  and  reject  H\  if  Z\  <  za. 

In  (37),  the  probability  of  Type  I  error  (accept  H\  when  Ho  is  true)  is  greatest  for  K  =  Km jn  and 
lower  for  K  <  Km\u  \  that  is,  Pr(accept  H\  \Hq  true)  =  a  for  K  =  Km\n,  and  Pr(accept  H\\Hq  true)  <  a 
for  K  <  Km\n.  Thus,  we  protect  ourselves  against  the  worst  possible  probability  of  Type  I  error  [12]. 
Such  an  error  corresponds  to  an  overcompression  error:  We  incorrectly  decide  that  the  post-compression 
classification  result  is  acceptable,  when  in  reality  lossy  compression  has  discarded  too  much  information.  In 
contrast,  a  Type  II  error  (accept  Hq  when  H\  is  true)  is  an  undercompression  error.  We  cannot  determine 
the  probability  of  Type  II  error  unless  we  replace  H\  with  an  equality  such  as  H\  :  K  =  K\,  and  then 
this  probability  only  holds  for  that  particular  value  of  K\.  Since  overcompression  may  have  more  serious 
consequences  than  undercompression  (cf.  Section  1.3),  we  have  arranged  the  test  to  limit  the  probability  of 
overcompression  error. 

5.7  TWO-KAPPA  TEST  FOR  ASSESSMENT  OF  COMPRESSION  EFFECTS 

In  the  one-kappa  test,  the  reference  Corvz  does  not  represent  the  truth;  rather,  (7orig  >s  just  the  class 
image  produced  by  the  classifier  without  compression.  Hence,  the  test  decides  whether  or  not  a  classifier 
remains  consistent  with  itself  following  compression.  Thus,  if  the  classifier  performs  poorly  in  the  absence 
of  compression,  the  test  determines  whether  or  not  the  classifier  continues  to  perform  poorly  after  lossy 
compression.  The  test  says  nothing  about  how  closely  the  class  images  agree  with  truth.  Even  without 
compression,  a  classifier  will  not  typically  classify  all  pixels  correctly.  This  fact  leads  us  to  a  two-kappa  test , 
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Figure  18.  Diagram  of  two-kappa  test. 


which  compares  two  kappa  coefficients  to  answer  the  question,  “ For  a  given  classifier,  does  compression 
significantly  reduce  its  agreement  with  truth ?” 

This  approach  appears  in  Figure  18.  For  this  test,  we  use  CJruth  to  emphasize  that  the  reference  class 
image  represents  truth.  First,  we  apply  a  classifier  to  X  to  produce  the  class  image  Qng  and  compute  the 
kappa  coefficient  Kovxg  from  Ctruth  and  Cong.  Next,  we  apply  the  same  classifier  to  X ,  obtain  the  post¬ 
compression  class  image  Ccomp^  and  determine  a  second  kappa  coefficient  Kc0m p  from  Ctrulh  and  Ccomp.  We 
assume  we  can  tolerate  some  maximum  degradation  AKlo\  >  0  after  compression  and  construct  a  test  with 
maximum  significance  level  a: 


Ho  :  Korig  -  Kcomp  =  AKl0|,  (overcompression); 

H\  i  Aorig  Acomp  ^  AATtoi,  (acceptable). 


(39) 


The  test  statistic  is 


r/  Aorig  A"comp  AA\0l 
Z2  =  - 


V5* 


+  aj> 

orig  ^  comp 


(40) 


where  dt<  and  d ^  are  the  estimates  of  the  standard  deviations  of  A^g  and  Kcomp.  Z±  has  a  standard 

normal  distribution  [4]. 


If  Zi  <  —  za,  we  accept  Hi;  otherwise,  we  reject  Hi.  Since  AA'toi  >  0,  the  test  also  accepts  F/j 
(with  maximum  significance  level  a)  in  the  case  where  A^omp  >  Aorig*  *  e.,  where  compression  improves 
classifier  performance;  such  behavior  could  occur  in  the  conventional  approach  because  lossy  compression 
can  have  a  denoising  effect  [2].  Finally,  like  (37),  (39)  is  posed  such  that  a  Type  I  error  corresponds  to  an 
overcompression  error. 
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6.  EXPERIMENTAL  RESULTS 


This  section  documents  results  for  an  HSI  cube  of  a  forest  scene  (Forest  Radiance,  run05)  in  the 
HYDICE  data  collection.  This  data  was  collected  from  an  airborne  platform  at  an  altitude  of  1.5  km;  at  this 
altitude,  the  ground  sample  distance  is  approximately  0.8  m.  The  HYDICE  sensor  returns  data  as  12-bit 
unsigned  integers.  However,  calibration  increases  the  dynamic  range  and  occasionally  produces  negative 
values  [13];  hence,  the  radiance  values  are  stored  as  16-bit  signed  integers. 

The  initial  radiance  cube  had  dimensions  of  320  x  1280  x  210  but  was  modified  prior  to  the  com¬ 
pression  experiments.  Spatially,  the  ten  leftmost  and  rightmost  columns  were  discarded  because  they  cor¬ 
responded  to  all-zero  pixels  at  the  edges  of  the  sensor  focal  plane  array.  Spectrally,  bands  corresponding  to 
the  water-absorption  wavelengths  were  eliminated.  This  preparation  step  yielded  an  original  cube  X  of  size 
300  x  1280  x  145,  requiring  106  MB  (111.36  x  106  bytes)  of  storage.  All  rates  and  compression  ratios  are 
reported  relative  to  this  cube.  The  actual  dynamic  range  of  X  was  —121  to  12911. 

The  compression  scheme  described  in  Section  2  was  then  applied  to  X  for  a  variety  of  compres¬ 
sion  settings.  The  PCA/KLT  side  information  ( X  and  4>)  required  Bs\fe  =  32  bits  per  value,  so  SSH\e  = 
84680  bytes.  The  equal-rate  allocation  method  used  =  145,  50,  20,  15,  10,  and  5,  and  overall  compres¬ 
sion  ratios  rovri  ranging  from  14.5  up  to  about  340.  Recall  that  for  this  method,  with  7^vri  and  N'c  given, 
rspat,A:  =  ( / Nc ) / T’ovri  for  each  retained  PC  image.  The  MSD-optimal  allocation  method  used  the  same 
values  of  rovri. 

Figure  19  shows  a  graph  of  the  eigenvalues  for  the  modified  radiance  cube.  Like  Figure  6  for  Moffett 
Field,  the  first  few  eigenvalues  have  much  larger  values  than  subsequent  eigenvalues.  Next,  Figure  20  shows 
an  example  of  the  allocation  results  for  rovri  =  30.  The  equal-rate  allocation  method  compresses  each 
retained  PC  image  at  almost  the  same  spatial  compression  ratio,  while  the  MSD-optimal  rate  allocation 
method  compresses  the  first  few  PC  images  very  little  and  increases  the  amount  of  spatial  compression  for 
the  later  PC  images.  Finally,  Figure  21  shows  the  values  of  Af.  selected  by  the  MSD-optimal  method  as 
a  function  of  rovri.  Note  also  that  for  each  choice  of  A£,  the  retained  PC  images  were  compressed  with 
different  spatial  compression  ratios. 

Figure  22  shows  RGB  images  for  a  portion  of  the  original  cube  X  and  several  decompressed  cubes. 
At  rovri  ~  30,  the  decompressed  cubes  look  very  similar  to  X.  At  rovri  ~  340,  compression  artifacts  are 
obvious  for  the  equal-rate  allocation,  less  so  for  the  MSD-optimal  allocation. 

In  order  to  generate  an  initial  set  of  classes,  we  employed  the  ISODATA  unsupervised  clustering  al¬ 
gorithm  [10].  ISODATA  begins  by  randomly  choosing  a  number  of  initial  mean  spectra  as  cluster  centroids; 
then  an  iterative  procedure  follows:  First,  each  pixel  in  X  is  assigned  to  the  cluster  whose  centroid  is  closest 
in  a  Euclidean-distance  sense.  Second,  the  standard  deviations  of  the  resulting  clusters  are  then  computed, 
and  clusters  with  large  standard  deviations  are  split.  Third,  the  centroids  of  the  resulting  clusters  are  com¬ 
puted,  and  clusters  whose  centroids  are  close  together  are  merged.  Finally,  the  procecedure  is  repeated  until 
it  converges,  producing  L  clusters.  We  denote  the  class  image  associated  with  the  final  clustering  by  Go, 
and  the  corresponding  set  of  class  means  by  Go-  A  portion  of  Gso  appears  in  Figure  23,  along  with  the 
corresponding  RGB  image.  For  the  given  cube,  ISODATA  produced  15  clusters,  which  served  as  the  basic 
classes:  In  the  following  classification  results,  the  number  of  classes  is  L  =  15. 
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6.1  DISTORTION  RESULTS 


Figure  24  shows  plots  of  both  RMSD(Ar,  A)  and  MSAD(A,  A)  versus  the  overall  compression  ratio.4 
The  curves  associated  with  the  equal-rate  allocation  scheme  are  marked  with  the  corresponding  value  of  fy. 
The  curve  corresponding  to  the  MSD-optimal  allocation  is  marked  “MSD-opt.” 

As  expected,  the  MSD-optimal  allocation  has  substantially  lower  RMSD  over  the  entire  range  of  7oVri- 
With  regard  to  MSAD,  the  MSD-optimal  allocation  has  the  smallest  distortion  for  rovri  <  200,  while  the 
equal-rate  allocation  with  N*c  =  5  is  slightly  better  for  larger  values  of  rovri-  Recall  that  for  these  large  values 
of  rovri,  the  MSD-optimal  allocation  retained  only  3-5  PC  images. 


6.2  CLASSIFICATION  RESULTS 

To  prepare  for  the  one-kappa  test,  we  adopted  £so  from  ISODATA  as  our  spectral  library.  We  applied 
either  the  EMD  or  SAM  classifier  to  A,  and  the  resulting  class  image  served  as  the  reference  class  image 
Cong;  the  ISODATA  class  image  CiSO  was  not  used  with  this  test.  For  each  decompressed  cube  A,  the  same 
classifier  was  again  applied  to  produce  a  post-compression  class  image  Qomp.  Lastly,  K  was  determined 
from  Corig  and  Cc0mp-  Figures  25  and  26  show  portions  of  the  class  images  produced  by  the  EMD  and 
SAM  classifiers,  respectively.  Consistent  with  Figure  22,  Ccomp  is  similar  to  Corig  at  rovri  =  29  for  both 
allocations,  but  many  differences  appear  for  the  equal-rate  allocation  at  ^Vri  =  340. 

The  test  was  performed  with  a  =  0.01  and  Km\n  =  0.95.  Figure  27  shows  the  results  for  EMD 
and  SAM  classification  as  7ovri  is  varied.  The  100(1  —  a)%  confidence  intervals  associated  with  A"  were 
extremely  narrow  and  are  not  shown.  The  dotted  line  indicates  A^jn-  If  K  lies  entirely  above  this  line,  then 
we  decide  that  the  classification  differences  are  insignificant.  For  EMD  classification,  the  MSD-optimal 
allocation  results  in  considerably  less  performance  degradation  than  the  equal-rate  allocations.  It  does  not 
cause  significant  degradation  provided  that  rovri  <  200,  while  the  equal-rate  allocations  yield  acceptable 
results  for  7'ovri  <  105.  For  SAM  classification,  the  MSD-optimal  allocation  again  produced  better  results 
than  the  equal-rate  allocation,  but  the  amount  of  compression  that  could  be  tolerated  was  lower. 

For  the  two-kappa  test,  we  again  used  £*s 0  as  the  spectral  library,  but  we  also  employed  Qso  as  the 
truth  class  image;  i.e.,  CUth  =  C\so.  Recall  that  a  portion  of  C\so  appears  in  Figure  23.  Either  the  EMD 
or  SAM  classifier  was  applied  to  A  to  generate  CG rig,  and  A0ng  followed  from  C\so  and  Corig.  For  each 
decompressed  cube  A,  we  again  performed  classification  to  obtain  Qomp  and  determined  ATcomp  from  CjSO 
and  Ccomp* 

Figure  28  shows  the  results  with  a  =  0.01  and  AA,0i  =  0.05.  Again  the  confidence  intervals  were 
narrow  and  are  omitted.  We  decided  that  the  classification  differences  are  not  significant  wheneverAT0rig  - 
ATcomp  lies  entirely  below  the  dotted  line  for  AAi0i.  For  EMD  classification,  AT0rig  =  0.9553,  indicating  that 
Corig  agreed  very  closely  with  Cjso  (cf.  Figures  22  and  25).  With  compression,  the  MSD-optimal  allocation 
outperformed  the  equal-rate  allocations  again;  it  caused  no  significant  change  in  classification  performance 
up  to  7’ovri  =  305.  For  SAM  classification,  AT0ng  =  0.5297,  which  indicates  moderate  agreement  between 

4Peak  signal  to  noise  ratio  (PSNR)  is  often  used  instead  of  RMSD.  However,  the  calibrated  radiance  data  usually  span  only  a 
portion  of  the  possible  dynamic  range,  which  makes  computation  of  PSNR  difficult  [13],  so  we  do  not  report  it. 
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Cong  and  CjS0  (cf.  Figures  22  and  26).  This  result  is  not  too  surprising  because  ISODATA  employs  Euclidean 
distance,  but  SAM  uses  spectral  angle.  The  MSD-optimal  allocation  resulted  in  the  fewest  differences  and 
did  not  cause  significant  degradation  over  the  entire  range  of  rovri  tested.  Here,  the  equal-rate  allocation  also 
performed  well  for  N'c  =  5. 

6.3  DISTORTION  AND  CLASSIFICATION  RESULTS 

The  graphs  in  Figures  24,  27,  and  28  also  show  that  both  distortion  and  classification  differences 
change  monotonically  as  7oVri  increases.  Such  behavior  suggests  that  distortion  might  be  used  to  anticipate 
classification  degradation. 

An  example  appears  in  Figure  29.  For  the  MSD-optimal  allocation,  EMD  classification  degrades  with 
increasing  RMSD  and  becomes  unacceptable  for  RMSD  >  65;  SAM  classification  remains  acceptable  until 
RMSD  >  45.  Similar  behavior  for  the  two-kappa  test  can  be  observed  in  Figure  30. 


Figure  19.  Eigenvalues  for  Forest  Radiance.  Only  the  first  eighty  eigenvalues  are  shown. 
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Figure  20.  Example  allocation  results  for  Forest  Radiance  for  an  overall  compression  ratio  of  30. 


MSD-Optimal  Allocation  Results 


Figure  21.  Number  of  PC  images  retained  by  MSD-optimal  allocation  for  Forest  Radiance. 
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|JA0^  001  ~  |JA0“t  Ot'C  ~  |JA0- 


(kept  59  PCs) 


(kept  8  PCs) 


(kept  3  PCs) 

Figure  22.  RGB  images  of  a  portion  of  the  original  and  decompressed  Forest  Radiance  cubes. 
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ISODATA  result 


Figure  23.  Original  RGB  image  and  ISODATA  class  image  Ci.s„for  Forest  Radiance. 
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Figure  24.  Distortion  vs.  compression  ratio  for  Forest  Radiance. 


43 
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(kept  8  PCs) 


(kept  3  PCs) 

Figure  25.  Class  images  for  EMD  classifier  and  Forest  Radiance. 
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Figure  26.  Class  images  for  SAM  classifier  and  Forest  Radiance. 
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Euclidean  Minimum  Distance  Classifier 


Spectral  Angle  Mapper 


Figure  27.  One-kappa  test  v.v.  compression  ratio  for  Forest  Radiance. 
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Figure  28.  Two-kappa  test  vs.  compression  ratio  for  Forest  Radiance. 
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Euclidean  Minimum  Distance  Classifier 


Spectral  Angle  Mapper 


Figure  29.  One -kappa  test  vs.  RMSD  for  Forest  Radiance. 
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30.  Two- kappa  test  vs.  RMSDfor  Forest  Radiance. 
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7.  SUMMARY  AND  CONCLUSIONS 


We  have  demonstrated  the  need  for  lossy  compression  of  HSI  data  and  discussed  strategies  and  ap¬ 
proaches  to  the  problem.  We  have  adopted  the  conventional  approach,  in  which  traditional  compression 
methods  and  traditional  application  processing  algorithms  are  employed.  Other  approaches,  while  poten¬ 
tially  beneficial,  tend  to  be  specific  to  a  single  application  and  can  become  unwieldy  given  the  complexities 
of  both  the  compression  scheme  and  application  processing. 

We  have  examined  some  effects  of  lossy  compression  of  HSI  data  for  data  products  and  background 
classification.  The  compression  scheme  consists  of  spectral  compression  via  PCA/KLT  followed  by  spatial 
compression  of  the  PC  images  via  JPEG2000.  Data  products  considered  were  the  normalized  difference  veg¬ 
etation  index  (NDVI)  and  integrated  column  water  vapor  (CWV).  To  assess  compression-induced  changes 
in  classification  results,  we  proposed  two  statistical  tests  based  on  kappa  analysis;  the  tests  allow  us  to  limit 
the  probability  of  overcompression  error,  which  we  deem  more  serious  than  an  undercompression  error. 

The  experimental  results  demonstrated  the  importance  of  spatial  rate  allocation  among  the  PC  images. 
The  equal-rate  allocation  method  does  not  consider  distortion  at  all.  It  simply  compresses  a  selected  num¬ 
ber  of  PC  images  at  the  same  spatial  compression  ratio.  In  contrast,  the  MSD-optimal  method  considers 
the  average  squared  distortion  over  all  spectral  bands,  not  the  distortion  in  each  individual  spectral  band. 
Although  it  attempts  to  produce  the  smallest  RMSD  between  the  noncompressed  and  decompressed  data,  it 
may  do  so  at  the  expense  of  increased  distortion  in  some  bands. 

The  MSD-optimal  allocation  method  produced  the  smallest  RMSD  and  MAD  between  the  noncom¬ 
pressed  and  decompressed  data  at  all  compression  ratios.  In  addition,  this  method  produced  the  smallest 
MS  AD  at  lower  (<  200)  compression  ratios.  At  high  compression  ratios  (>  200),  the  MSD-optimal  method 
retained  only  3-5  PC  images,  and  the  equal-rate  method  slightly  outperformed  it. 

With  regard  to  NDVI,  the  MSD-optimal  allocation  method  yielded  smaller  errors  than  the  equal- 
rate  allocations  did.  For  CWV,  it  displayed  the  best  results  at  lower  compression  ratios  (50),  but  at  high 
compression  ratios  (200),  the  equal-rate  method  performed  better  when  a  sufficient  number  of  PC  images 
were  retained.  This  behavior  is  explained  by  recognizing  that  CWV  is  calculated  from  water-absorption 
bands,  which  have  low  signal  power  and  therefore  can  incur  significant  spectral  distortion  under  the  MSD- 
optimal  allocation  method  at  high  compression  ratios. 

For  classification,  the  MSD-optimal  method  consistently  produced  significantly  better  results  than 
equal-rate  allocations.  Remarkably,  compression  did  not  significantly  change  classification  results  even  at 
overall  compression  ratios  as  high  as  100-340.  We  note  that  the  spectral  library  was  derived  from  in-scene 
data  and  thus  represented  a  kind  of  “best-case”  library.  Also,  background  classification  is  perhaps  not  as 
demanding  as  some  other  HSI  applications.  Lossy  compression  and  background  classification  are  somewhat 
alike  in  the  sense  that  compression  tends  to  preserve  coarse  features  and  discard  fine  ones,  and  classification 
tends  to  classify  together  spectra  that  are  roughly  similar  but  might  differ  in  their  fine  details.  However,  for 
a  similar  compression  scheme,  Shen  and  Kasner  [26]  reported  “no  appreciable  degradation”  for  anomaly 
detection  and  material  identification  at  compression  ratios  up  to  48,  the  highest  they  tested. 

In  summary,  these  results  show  that  HSI  data  possesses  a  large  amount  of  redundancy,  which  can  be 
exploited  by  lossy  compression  to  produce  high  compression  ratios  without  significantly  degrading  appli- 
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cation  performance.  Rate  allocation  is  an  important  topic,  since  it  determines  how  the  limited  resource  of 
storage  or  transmission  bandwidth  is  consumed.  For  the  applications  tested,  the  MSD-optimal  allocation 
method  generally  outperformed  the  simple  equal-rate  method.  Finally,  statistical  tests  are  encouraged  as 
a  means  for  evaluating  the  effects  of  lossy  compression  on  application  results  and  protecting  against  an 
overcompression  error.  The  example  application  of  background  classification  proposed  two  such  tests. 

Some  avenues  for  future  research  follow  naturally  from  this  study.  Although  we  have  examined  some 
data  products  and  classification,  another  important  area  of  interest  is  the  effect  of  lossy  compression  on  the 
detection  of  targets  or  anomalies  in  hyperspectral  data.  This  topic  is  the  subject  of  another  ongoing  study. 
Regarding  rate  allocation,  the  methods  employed  here  are  both  standard  techniques  in  lossy  compression. 
An  area  of  high  potential  benefit  is  the  development  of  other  allocation  methods  that  consider  the  importance 
of  different  spectral  bands.  Statistical  tests  to  assess  effects  of  compression  on  applications  can  also  be 
developed  for  other  applications.  For  detection  applications,  tests  could  use  a  measure  such  as  the  area 
under  the  receiver  operating  characteristic  (ROC)  curve.  For  quantitative  data  products,  standard  tests  such 
as  the  t- test  [12,  Ch.  7]  could  be  employed. 


52 


A.  RATE  ALLOCATION 


Assume  there  are  N  independent  sources  with  respective  distortion-rate  functions  Dl{Ri ),  i  =  1,2, 
. . N.  Let  the  desired  rate  be  R^es.  For  the  allocation  R  =  (i?i,  R ,  R n),  the  average  distortion  is 

1  N 

D(R)  =  D(R1,R2,...,RN)  =  ~Y,D^ 

i=  1 


and  the  rate  is 

i=  1 

The  rate  allocation  problem  is  then 


l  N  N 

min  D( R)  =  min  —  V'  Di(Ri)  such  that  R(R)  =  —  V'  Rj  =  Rdes.  (41 ) 

R  {«,} 

1=1  t=l 


Equation  (41)  is  a  classic  constrained  optimization  problem,  which  can  be  solved  by  applying  the 
Lagrange  multiplier  method.  Construct  the  Lagrangian  cost  function  J(R), 


1 

N 


N 

y  ]  Ri  ~  R(ies  • 

t=l 


Setting  dJ/dRi  =  0  for  each  i  yields  the  equal-slope  or  Pareto  condition 


dDt 

dRi 


=  ~V 


Vi, 


(42) 


where  the  optimum  Lagrange  multiplier  value  rf  >  0  is  chosen  so  that  (l/iV)£^i  Rj  =  Rdes. 

For  a  scalar  Gaussian  source  with  variance  a2  and  MSD  distortion,  D(R.)  =  a2 2~2R  =  a2e~R,n4 
[5],  For  other  sources,  D(R)  is  often  approximated  as  D(R)  ss  Ca2e~'yR ,  R  >  0,  where  C  and  7  are 
positive  constants  [30].  In  the  remaining  discussion,  assume  that 


Di{Ri)  =  Cia2e~ltR ’ ,  Ri  >  0,  *  =  1, 2, . . . ,  N. 


(43) 


The  Pareto  condition  (42)  gives 


dDj 

OR, 


-CiCTjUe  7l  R‘  =  -r)* 


Solving  this  equation  for  /?„  gives  the  solution 


R*  =  max  /  —  In  ^'lGi  ^ ,  o\,  *  =  1,2 _ _  N. 

1 7*  V*  ) 


(44) 


Convexity  of  the  assumed  distortion-rate  functions  means  that  if  can  be  determined  efficiently  using  a 
bisection  search. 

The  associated  distortions  can  be  calculated  by  substituting  (44)  into  (43): 


Di(RS) 


if  hi,  if  R*  >  0; 
Ctf,  if  R*  =  0; 


i  =  1,2,. . .  ,N. 


Hence,  the  sources  that  are  assigned  a  positive  rate  have  distortions  that  are  proportional  to  7"1,  the  re¬ 
ciprocal  of  their  exponential  constant  7.  In  the  case  where  all  sources  have  the  same  exponential  constant 
7i  =  7,  V  i ,  the  sources  with  positive  rates  operate  at  the  same  distortion,  if  / 7. 
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